Massive Data Discrimination via Linear Support Vector Machines

نویسندگان

  • P. S. Bradley
  • O. L. Mangasarian
چکیده

Abst ract A linear support vector machine formulation is used to generate a fast, nitely-terminating linear-programming algorithm for discriminating between two massive sets in n-dimensional space, where the number of points can be orders of magnitude larger than n. The algorithm creates a succession of suuciently small linear programs that separate chunks of the data at a time. The key idea is that a small number of support vectors, corresponding to linear programming constraints with positive dual variables, are carried over between the successive small linear programs, each of which containing a chunk of the data. We prove that this procedure is monotonic and terminates in a nite number of steps at an exact solution that leads to an optimal separating plane for the entire dataset. Numerical results on fully dense publicly available datasets, numbering 20,000 to 1 million points in 32-dimensional space, connrm the theoretical results and demonstrate the ability to handle very large problems.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data Discrimination via Nonlinear Generalized Support Vector Machines

The main purpose of this paper is to show that new formulations of support vector machines can generate nonlinear separating surfaces which can discriminate between elements of a given set better than a linear surface. The principal approach used is that of generalized support vector machines (GSVMs) which employ possibly indeenite kernels 17]. The GSVM training procedure is carried out by eith...

متن کامل

A prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)

Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...

متن کامل

A prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)

Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...

متن کامل

Massive Data Classification via Unconstrained Support Vector Machines

A highly accurate algorithm, based on support vector machines formulated as linear programs [13, 1], is proposed here as a completely unconstrained minimization problem [15]. Combined with a chunking procedure [2] this approach, which requires nothing more complex than a linear equation solver, leads to a simple and accurate method for classifying million-point datasets. Because a 1-norm suppor...

متن کامل

Algorithms for Sparse Linear Classifiers in the Massive Data Setting

Classifiers favoring sparse solutions, such as support vector machines, relevance vector machines, LASSO-regression based classifiers, etc., provide competitive methods for classification problems in high dimensions. However, current algorithms for training sparse classifiers typically scale quite unfavorably with respect to the number of training examples. This paper proposes online and multi-...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998